Welcome to Statistics 422

Lecture 1

Getting to know you

Late afternoon social

Let’s use this time to interact & relax before getting started

  • Please gather in small groups (minimum 3 classmates, maximum 6)

  • If you don’t know someone, please introduce yourself

  • If your group sees someone without a group, please extend an invitation to join

About the Course

Instructor

Dr. Vivian Lew

  • Math Sciences 8923 (neighbor is Nicolas Christou)

  • BruinLearn e-mail is the best way to reach me

  • No discussion or TA or grader for this course

Other Course Details

  • Meets once a week, Thursdays 6pm - 8:50pm
    • Physics and Astronomy Building 2748
  • Office Hours
    • M 4pm - 6pm (MS 8923) & F 3pm-5pm (Zoom)
    • You are welcome to schedule Zoom appointments; most evenings and weekends work too

What to expect

  • Format: During Class
    • Workshop style - some lecturing, some small group activities, breaks
    • Interactive - please bring your electronics (laptop, iPad, maybe phone)
  • Format: Outside of Class
    • Individual homework assignments, individual late mid-quarter project, individual final project
    • Interaction with me is encouraged, but interaction with each other is more important
    • Stats 422 Community interaction on Campuswire, perhaps examine the weekly data posted to Tidy Tuesday

What to expect

Developing the scaffolding needed for understanding and building on one’s existing data visualization skills.

  • Aspects
    • Translating data and its relationships visually for others.
    • Learning to structure and organize data for effective visualization.
    • Learning about the different types of visual representation and their purposes.
    • Understanding graphical perception (how visual information is perceived) and some of the cognitive principles involved.
    • Learning to generate static and interactive visualizations.
    • Knowing how to present your visualizations in different media (print, website, video)

What to expect

  • Process
    • Start with an audience and an idea and find the data (not necessarily in that order)
    • Prepare the data for visualization (e.g., clean, pivot)
    • Represent the data visually with various graphical elements (e.g., color, sizes, and shape)
    • Post-process the data (e.g., decide whether interactivity would add power, or post-process a static visualization)
  • Tools
    • R/ggplot2, Shiny; Python/matplotlib, Streamlit; Tableau
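The "pivot" step named under Process can be sketched in R. This is a minimal illustration, assuming the tidyr package is installed; the `wide` table and its values are made up for the example and are not course data.

```r
library(tidyr)

# Hypothetical wide table: one row per city, one column per year
wide <- data.frame(
  city = c("A", "B"),
  `2022` = c(10, 20),
  `2023` = c(12, 18),
  check.names = FALSE  # keep the year column names as written
)

# Pivot to long format: one row per (city, year) pair,
# which is the shape most plotting tools expect
long <- pivot_longer(
  wide,
  cols = -city,
  names_to = "year",
  values_to = "value"
)

print(long)
```

In the long table, `year` becomes an ordinary variable that can be mapped to an axis or a color, rather than being spread across column names.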

What to expect

  • Grading
    • 30% Attending class weekly with in-class team activity
    • 20% 4 individual homework assignments
    • 5% Campuswire participation
    • 20% Web App: (individual) Find your data, Design a page, Code and Video
    • 25% Final Project: (individual) Find your own data, Report, Code and Video
  • The data for the Web App and the Final Project can be the same, but ideally use a simple dataset like iris for the Web App and something more closely related to your thesis (so something more complex) for your final project

Questions?

Group Activity 1 (15-20 minutes)

  • If you aren’t already in a group, please join one, minimum 3 maximum 6. Please put yourself into a Week 1 group on BruinLearn to get credit for being here tonight.

  • Someone take a team photo (selfie/grelfie/usie/0.5?) and upload an annotated copy to BruinLearn for credit for attending today.

  • Let’s take 15-20 minutes to examine and discuss some graphs created by ggplot2.

  • https://r-charts.com/ggplot2/ or https://r-graph-gallery.com/ggplot2-package.html

  • Ideally, I would like you to examine either or both of these sites as a team.

  • Please choose the one plot from your group that you think is most memorable (can be good or not good), and have one team member share the graph on Campuswire and be prepared to tell us why your team chose it.

Let’s take a 10 minute break

ggplot2

Data visualization

  • Understanding and creating visual representations of data

  • R is one of our tools, and ggplot2 is THE package

  • gg means “Grammar of Graphics”

  • The Grammar of Graphics was written by Leland Wilkinson in 1999.

  • Wilkinson detailed a comprehensive framework to describe and build a wide range of statistical graphics by breaking down the elements of graphics into a unified system.

Grammar of Graphics

Data

  • In GG, data is the foundation
  • All graphical representation decisions rest on top of it
  • All the decisions are data-driven
  • Result: meaningful and accurate visual representation of data
  • AND a wide variety of graphics

Aesthetics (Aesthetic Mappings)

  • These are the rules
  • They map (connect) data variables to the visual properties of the graphical elements
  • Most frequently - x, y, color, size, and shape
  • The connection of data to graphical elements allows us to see/interpret patterns, trends, and anomalies within the data
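As a rough sketch of what an aesthetic mapping looks like in ggplot2, the example below uses the built-in iris dataset. The choice of variables is only an illustration, not a prescribed course example.

```r
library(ggplot2)

# aes() states the rules: which data variable drives which visual property
p <- ggplot(iris, aes(x = Sepal.Length,   # horizontal position
                      y = Sepal.Width,    # vertical position
                      color = Species)) + # point color by category
  geom_point()

print(p)
```

Changing only the `aes()` call (for example, mapping `Petal.Length` to `size`) changes what the reader can see in the same data.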

Geometries include

  • Points, typically used for individual values

  • Lines, connect points in some defined order (e.g., time, location)

  • Bars, typically used for categories (e.g., pizza types)

  • Polygons, used to represent areas (e.g., maps)

  • Paths, fatter lines, used to represent flows (Sankey)

[Figure from the ggplot2 cheatsheet]
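A few of the geometries listed above can be tried directly in ggplot2. The sketch below assumes the iris dataset (base R) and the economics dataset that ships with ggplot2. It covers points, bars, and lines only, leaving polygons and paths aside.

```r
library(ggplot2)

# Points: one mark per individual flower
p_points <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point()

# Bars: one bar per category, height = number of rows in that category
p_bars <- ggplot(iris, aes(Species)) +
  geom_bar()

# Lines: observations connected in order along the x axis (here, time)
p_line <- ggplot(economics, aes(date, unemploy)) +
  geom_line()

print(p_points)
print(p_bars)
print(p_line)
```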

Facets and Layers

  • Faceting conditions the data on one or more variables

  • Each subset is drawn in its own panel (facet)

  • Layers combine multiple types of visualizations into a single graphic
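One possible way to see facets and layers together is the sketch below. It assumes the iris dataset; the linear fit is only an illustrative second layer.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +                                           # layer 1: raw points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: fitted line
  facet_wrap(~ Species)                                    # one panel per species

print(p)
```

Both layers share the same mapping, and each facet repeats the full stack of layers on its own subset of the data.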

Statistics

  • Data can be summarized or modified prior to visualization
    • Aggregation (like sums or means)
    • Smoothing
  • Can clarify the underlying structure or relationships in data.
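Both kinds of statistical transformation can be sketched in ggplot2, again assuming the iris dataset. The choice of the mean and of a loess smoother is an assumption made for illustration.

```r
library(ggplot2)

# Aggregation: compute the mean per species, then draw it as a bar
p_mean <- ggplot(iris, aes(Species, Sepal.Length)) +
  stat_summary(fun = mean, geom = "bar")

# Smoothing: overlay a loess curve on the raw points
p_smooth <- ggplot(iris, aes(Sepal.Length, Petal.Length)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

print(p_mean)
print(p_smooth)
```

In both plots the summary is computed by ggplot2 at draw time, so the original data frame is left unchanged.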

Coordinates and scales

  • Scaling (linear, logarithmic, etc.)

  • Coordinate systems (Cartesian, polar, etc.).

  • They influence the position of data points (and the interpretation).
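To make the effect on position concrete, the sketch below plots a small made-up data frame (`d` is hypothetical) on linear and log scales, and draws a bar chart of iris in polar coordinates.

```r
library(ggplot2)

# Hypothetical data spanning several orders of magnitude
d <- data.frame(x = c(1, 10, 100, 1000),
                y = c(2, 20, 200, 2000))

# Linear scales: the three smallest points crowd into one corner
p_linear <- ggplot(d, aes(x, y)) + geom_point()

# Log scales: the same points become evenly spaced
p_log <- p_linear + scale_x_log10() + scale_y_log10()

# Polar coordinates: the same bars as a Cartesian bar chart, bent around a circle
p_polar <- ggplot(iris, aes(Species)) + geom_bar() + coord_polar()

print(p_linear)
print(p_log)
print(p_polar)
```

The data are identical in `p_linear` and `p_log`; only the scale changes, and with it the pattern a reader is likely to perceive.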

Themes

  • Control the overall “look” (styling) of the graphic
  • fonts, colors
  • labeling, legends, captioning
  • layout etc.
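A theme can be applied without touching the data or the mappings. The sketch below assumes the iris dataset; the title, caption, and styling choices are placeholders rather than recommendations.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point() +
  labs(title = "Sepal dimensions by species",  # labeling
       x = "Sepal length (cm)",
       y = "Sepal width (cm)",
       caption = "Data: iris") +               # captioning
  theme_minimal(base_size = 12) +              # overall look and font size
  theme(legend.position = "bottom",            # layout of the legend
        plot.title = element_text(face = "bold"))

print(p)
```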

Activity 2 - Team ggplot